Видео с ютуба Cuda Inference
Nvidia CUDA in 100 Seconds
Nvidia CUDA vs Apple Metal for AI Work
Запуск ИИ на FreeBSD (проблема CUDA)
Maximize LLM Inference Performance + Auto-Profile/Optimize PyTorch/CUDA Code
Оптимизация инференса LLM: асинхронный непрерывный батчинг с использованием CUDA Streams
Analyzing Deepseek's "undefined" NVIDIA PTX optimizations (with benchmarks!)
CUDA Programming Course – High-Performance Computing with GPUs
How a GPU Actually Works (and Powers AI)
FASTER Inference with Torch TensorRT Deep Learning for Beginners - CPU vs CUDA
CUDA vs Triton - Which GPU Programming Tool Is Better in 2026?
Must Know Technique in GPU Computing | Episode 4: Tiled Matrix Multiplication in CUDA C
Understanding the LLM Inference Workload - Mark Moyou, NVIDIA
Освоение оптимизации вывода LLM: от теории до экономически эффективного внедрения: Марк Мойу
Demo - Chatbot Response Acceleration with CUDA LLM Inference
Dual RTX 5090s Destroy AI Benchmarks Ollama, CUDA Burn & 34B Model
How Much GPU Memory is Needed for LLM Inference?
Stillwaters AI - LLM Systems Engineering | Inference, CUDA Memory, Tensor Cores, Observability, HPC
Техническая сторона инференса LLM: взгляд внутрь GPU
AIKit - Running Inference with CUDA
Parallel Computing with Nvidia CUDA